Skip to content

Use acq/rel semantics to pass flags/pointers in getrf_parallel.#2469

Merged
martin-frbg merged 1 commit intoOpenMathLib:developfrom
AGSaidi:acq-rel-2
Mar 6, 2020
Merged

Use acq/rel semantics to pass flags/pointers in getrf_parallel.#2469
martin-frbg merged 1 commit intoOpenMathLib:developfrom
AGSaidi:acq-rel-2

Conversation

@AGSaidi
Copy link
Copy Markdown
Contributor

@AGSaidi AGSaidi commented Feb 29, 2020

The current implementation has locks, but the locks each only
have a critical section of one variable so atomic reads/writes
with barriers can be used to achieve the same behavior.

Like the previous patch, pthread_mutex_lock isn't fair, so in a
tight loop the previous thread that has the lock can keep it
starving another thread, even if that thread is about to write
the data that will stop the current thread from spinning.

On a 64c Arm system this reduces variability by 20x on sgesv.goto.

@martin-frbg
Copy link
Copy Markdown
Collaborator

martin-frbg commented Feb 29, 2020

I must admit I am a bit confused about the use of barriers vs. locks on ARMV8 - could you comment on #2363 where a memory barrier was seen to be insufficient due to the weak memory on that platform ?
(Sorry, unintentionally closed your PR due to touchpad malfunction)

@martin-frbg martin-frbg reopened this Feb 29, 2020
@AGSaidi
Copy link
Copy Markdown
Contributor Author

AGSaidi commented Feb 29, 2020

@martin-frbg Is there an official list of the compilers/versions that are supported? Given that manylinux1 is there, while I think it's clearer, this pre-dates support of the c11 atomics. While I think the acq/rel functions make it clearer, if that is requirement I'll probably re-work this to place the existing barrier types in the code in the correct places.

@martin-frbg
Copy link
Copy Markdown
Collaborator

martin-frbg commented Feb 29, 2020

No concise list as such, just a few warnings in the readme for specific platforms. Unfortunately requiring specific minimum versions causes problems for non-priviledged users on sites that use (possibly older) "enterprise" distributions. So if it would be possible to enclose this in appropriate ifdefs (for STDC_VERSION >= 201112L ) that would be preferable.

@AGSaidi AGSaidi force-pushed the acq-rel-2 branch 2 times, most recently from 62b6787 to a344c0e Compare March 4, 2020 06:06
@martin-frbg
Copy link
Copy Markdown
Collaborator

Looks like it's hitting a clang bug now. (I can make it compile by adding an explicit cast but need to convince myself that this is actually still the same thing)

@AGSaidi
Copy link
Copy Markdown
Contributor Author

AGSaidi commented Mar 5, 2020

I don't believe declaring these types as _Atomic is needed. _Atomic means that any operation on them is implicitly made atomic by the compiler (E.g. foo++ means atomically read the current value of foo and increment it by one). If this works with just volatile in old compilers, we're not asking for those kinds of guarantees and the _atomic* builtins don't require this either, "The ‘__atomic’ builtins can be used with any integral scalar or pointer type that is 1, 2, 4, or 8 bytes in length. " Removing the ifs declaring the types with _Atomic makes clang happy.

@martin-frbg
Copy link
Copy Markdown
Collaborator

Alright thanks, so let's nuke the atomics then. BTW something in your force-push seems to have added a lot of unrelated changes that got committed to develop in the meantime. I assume git will manage to sort this out during a merge, but anybody trying to cherry-pick by downloading the current patch file would get a surprise. Have not seen this before, would you happen to know how to rectify it ?

@AGSaidi
Copy link
Copy Markdown
Contributor Author

AGSaidi commented Mar 5, 2020

Yea, this was my doing as I rebased on top of the develop branch. Let me see if I can undo it.

Seems like the #ifdef .. _Atomic should go universally?

@AGSaidi
Copy link
Copy Markdown
Contributor Author

AGSaidi commented Mar 6, 2020

Looks like the arm32 gcc build hung the utest fork test?

The current implementation has locks, but the locks each only
have a critical section of one variable so atomic reads/writes
with barriers can be used to achieve the same behavior.

Like the previous patch, pthread_mutex_lock isn't fair, so in a
tight loop the previous thread that has the lock can keep it
starving another thread, even if that thread is about to write
the data that will stop the current thread from spinning.

On a 64c Arm system this improves performance by 20x on sgesv.goto.
@martin-frbg martin-frbg added this to the 0.3.10 milestone Mar 6, 2020
@martin-frbg
Copy link
Copy Markdown
Collaborator

Looks like it's ready. Thanks a lot for your work on this.

@martin-frbg martin-frbg merged commit dbef479 into OpenMathLib:develop Mar 6, 2020
@AGSaidi
Copy link
Copy Markdown
Contributor Author

AGSaidi commented Mar 6, 2020

Thanks @martin-frbg. Please feel free to ping me if you see other similar issues, i'm happy to help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants